First things first, let’s go ahead and reach into our library and load all necessary packages, namely tidycensus.
Getting Started
tidycensus is an extremely useful package if you’re someone who likes to work with US Census data. If you’ve ever explored the census.gov website, you’ll know that their downloadable datasets are not exactly readable, especially in R.
tidycensus allows you to search for variables, build clean datasets, and even use the information to make great visualizations, like maps! Today, our task is to visualize income inequality between Native Americans and White Americans in the US. To do so, we’ll analyze median household income across the 50 states and create a map (using the mapview package) displaying this variable.
First, you’ll need to install and/or load your Census API key. You can learn how to do so here.
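As a minimal sketch of that setup step, the key can be stored once with census_api_key() from tidycensus. The key string below is a placeholder, not a real key.

```r
library(tidycensus)

# "YOUR_API_KEY" is a placeholder: replace it with the key you received from the Census Bureau.
# install = TRUE writes the key to your .Renviron file so future sessions can find it.
# If a key is already stored there, add overwrite = TRUE to replace it.
census_api_key("YOUR_API_KEY", install = TRUE)

# Reload .Renviron so the key is also available in the current session.
readRenviron("~/.Renviron")

# Check that the key is visible to R (an empty string means no key was found).
Sys.getenv("CENSUS_API_KEY")
```

You only need to install the key once per machine; after that, loading tidycensus is enough.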
Loading Variables for Analysis: Native Americans
Now, it’s time to explore the different variables you can use in your analysis. To do so, you’ll use the load_variables() function and input the year and type of Census survey. We’ll be pulling from the 2021 ACS5 in this example. This is a big dataset, so I find it’s easier to use View() than head() here.
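Here is a minimal sketch of that lookup, assuming tidycensus is installed and you are online. The object names acs5_vars and income_vars, and the search phrase, are my own choices for illustration.

```r
library(tidycensus)

# Download the variable table for the 2017-2021 5-year ACS.
# cache = TRUE saves a local copy so later calls are faster.
acs5_vars <- load_variables(2021, "acs5", cache = TRUE)

# Browse the full table in a spreadsheet-style viewer (interactive sessions only).
# View(acs5_vars)

# Optional: narrow the table down by searching the concept column.
# ignore.case = TRUE guards against differences in capitalization.
income_vars <- acs5_vars[
  grepl("median household income", acs5_vars$concept, ignore.case = TRUE),
]

# Peek at the first six rows of the full table in the console.
head(acs5_vars)
```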
# A tibble: 6 × 4
  name        label                                   concept            geogr…¹
  <chr>       <chr>                                   <chr>              <chr>
1 B01001A_001 Estimate!!Total:                        SEX BY AGE (WHITE… tract
2 B01001A_002 Estimate!!Total:!!Male:                 SEX BY AGE (WHITE… tract
3 B01001A_003 Estimate!!Total:!!Male:!!Under 5 years  SEX BY AGE (WHITE… tract
4 B01001A_004 Estimate!!Total:!!Male:!!5 to 9 years   SEX BY AGE (WHITE… tract
5 B01001A_005 Estimate!!Total:!!Male:!!10 to 14 years SEX BY AGE (WHITE… tract
6 B01001A_006 Estimate!!Total:!!Male:!!15 to 17 years SEX BY AGE (WHITE… tract
# … with abbreviated variable name ¹geography
For now, I want to focus on Native Americans’ income data, so I’ll use the following variables: median household income for Native Americans (B19013C_001) and aggregate household income for Native Americans (B19025C_001).
I’m combining these in a vector and assigning them to a variable called acs_vars. Within the vector, I gave the variables more readable names for analysis purposes.
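A sketch of what that vector can look like, using the two variable codes above and the names median_income and aggregate_income that appear later in this post:

```r
# Named character vector: the names on the left become the column prefixes
# in the data that get_acs() returns.
acs_vars <- c(
  median_income    = "B19013C_001",  # median household income, Native American householder
  aggregate_income = "B19025C_001"   # aggregate household income, Native American householder
)

acs_vars
```

Naming the elements here saves a renaming step later, because the raw codes never show up as column names.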
Now, because we are going to make maps to visualize income inequality across the US, we’ll need to create a dataset that breaks median household income down by state and includes the shapefile info needed to generate a map in mapview.
We do this by once again using the get_acs() function, whose arguments include geography, variables, output, and geometry. Here’s a breakdown of what these mean:
geography = "state": we pull data at the state level
variables = c(acs_vars): we use the variables we defined previously (median_income, aggregate_income) in our dataset
output = "wide": each variable gets its own columns, so the data has one row per state and is easier to read
geometry = TRUE: this includes the shapefile data necessary to make a map
Code
# pull for US states
us_native_income <- get_acs(
  geography = "state",
  variables = c(acs_vars),
  output = "wide",
  geometry = TRUE
)
Getting data from the 2017-2021 5-year ACS
Downloading feature geometry from the Census website. To cache shapefiles for use in future sessions, set `options(tigris_use_cache = TRUE)`.
Looking at the first few rows, we can tell which columns hold Native Americans’ estimated median household income and which hold the margin of error by their trailing letter: E for estimate, M for margin of error. Since we are only concerned with the estimates, we’ll remove the margin of error columns, AKA any column ending with “M”.
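To illustrate the idea without calling the Census API, here is a sketch on a small made-up data frame that mimics the wide column layout (the values are invented and the geometry column is left out). The popup code later in this post refers to columns named NAM and median_income, which suggests the trailing E was stripped from the column names as well; that second step is my inference, not something stated above.

```r
# Toy stand-in for the wide get_acs() output (values are made up, geometry omitted).
toy_income <- data.frame(
  GEOID = c("01", "02"),
  NAME = c("Alabama", "Alaska"),
  median_incomeE = c(50000, 60000),
  median_incomeM = c(3000, 2500),
  aggregate_incomeE = c(9e8, 1.2e9),
  aggregate_incomeM = c(5e7, 6e7)
)

# Keep every column whose name does not end in a capital "M".
estimates_only <- toy_income[, !grepl("M$", names(toy_income))]

# Assumed extra step: strip the trailing "E" so median_incomeE becomes median_income.
# Side effect: NAME becomes NAM, which matches the column name used in the popup code.
names(estimates_only) <- sub("E$", "", names(estimates_only))

names(estimates_only)
```

Base R is used here only to keep the sketch dependency-free; the same filtering can be done with dplyr.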
For cosmetic reasons, we’ll place a $ in front of the median income estimates with the paste() function. I’m making this a new variable because, as we’ll see later, it’ll only be used to present information.
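A quick sketch of that step on two made-up values. One thing to watch for: paste() puts a space between its pieces by default, so the separator has to be set to an empty string.

```r
# Made-up median incomes standing in for the real estimates.
median_income <- c(48123, 61250)

# paste() separates its pieces with a space by default,
# so set sep = "" to avoid getting "$ 48123".
median_income_signed <- paste("$", median_income, sep = "")

# Optional variant with thousands separators; trim = TRUE drops padding spaces.
median_income_pretty <- paste0("$", format(median_income, big.mark = ",", trim = TRUE))

median_income_signed
```

Because the signed version is text, the original numeric column still has to be kept for anything that needs numbers, such as a map's color scale.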
Making a Map: Native Americans’ Median Household Income in the US
Now for the exciting part: mapmaking! There are a couple of components that we’ll use to make this map look very nice. One is a popup. Popups show up when you click on specific states on the map. We want to display the state name and Native Americans’ median household income in that state, so we’ll use the glue package to do the following:
Code
# NOTE: this function utilizes HTML syntax. For now, you just need to know to include the dataset$variable name you want to pull from.
# NAM is the state's name, and median_income_signed is the median income with the dollar sign.
mylabel <- glue::glue(
  "<strong>{us_native_income$NAM}</strong><br /> Median Native American Household Income: {us_native_income$median_income_signed}"
) %>%
  lapply(htmltools::HTML)
Now for our actual map! We’ll use mapview() with the following arguments:
The first argument is our dataframe
zcol = "median_income": the column we are pulling from
at = seq(): sets the legend between $20,000 and $120,000. I am manually setting a sequence because I want to compare this map with White Americans’ median income, and that would be difficult if both maps aren’t on the same scale
col.regions = RColorBrewer::brewer.pal(9, "PuBuGn"): allows me to use an RColorBrewer palette to color my map
popup = mylabel: allows me to use the label I made previously
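Putting those arguments together might look like the sketch below. I wrapped the call in a small helper, make_income_map(), which is a hypothetical name of my own, so that both maps are guaranteed to share one scale. The $20,000 step between legend breaks is also an assumption, since only the two end points are given above.

```r
library(mapview)

# Shared legend breaks from $20,000 to $120,000.
# The $20,000 step is an assumption; only the end points come from the post.
income_breaks <- seq(20000, 120000, by = 20000)

# Hypothetical helper (not from the post): draws one state-level income map.
# `data` is an sf data frame with a numeric median_income column,
# `label` is the list of HTML popups built with glue.
make_income_map <- function(data, label) {
  mapview(
    data,
    zcol = "median_income",
    at = income_breaks,
    col.regions = RColorBrewer::brewer.pal(9, "PuBuGn"),
    popup = label
  )
}

# Usage with the objects built earlier in the post:
# make_income_map(us_native_income, mylabel)
```

Reusing one helper with one set of breaks is what makes the colors comparable across the two maps.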
Making a popup to show state name and median household income:
Code
mylabel2 <- glue::glue(
  "<strong>{us_white_income$NAM}</strong><br /> Median White American Household Income: {us_white_income$median_income_signed}"
) %>%
  lapply(htmltools::HTML)
Right off the bat, we can see that Native Americans do not make as much as White Americans. By setting the maps to the same scales, we are able to tell as much from the colors alone. No state shows Native Americans having a median household income of $80,000 or above.
While these maps are already proving helpful in discovering income inequality between these two groups, we might benefit more by seeing these maps side-by-side. To do so, we’ll use the sync() function: